A Bipartite Graph-Based Ranking Approach to Query Subtopics Diversification Focused on Word Embedding Features

Authors

  • Md Zia Ullah
  • Masaki Aono
Abstract

Web search queries are often vague or ambiguous, or carry multiple intents, and different users may have different search intents when issuing the same query. Understanding these intents by mining the subtopics underlying a query has gained much interest in recent years. Query suggestions provided by search engines capture some intents of the original query; however, suggested queries are often noisy and contain groups of alternative queries with similar meaning. Therefore, identifying the subtopics that cover the possible intents behind a query is a formidable task. Moreover, because both the query and the subtopics are short, it is challenging to estimate the similarity between a pair of short texts and rank them accordingly. In this paper, we propose a method for mining and ranking subtopics in which we introduce multiple semantic and content-aware features, a bipartite graph-based ranking (BGR) method, and a similarity function for short texts. Given a query, we aggregate the suggested queries from search engines as candidate subtopics and estimate their relevance to the given query based on word embedding and content-aware features by modeling a bipartite graph. To estimate the similarity between two short texts, we propose a Jensen-Shannon divergence based similarity function over the probability distributions of the terms in the top retrieved documents from a search engine. A diversified ranked list of subtopics covering the possible intents of a query is assembled by balancing relevance and novelty. We evaluated our method on the NTCIR-10 INTENT-2 and NTCIR-12 IMINE-2 subtopic mining test collections. Our proposed method outperforms the baselines, known related methods, and the official participants of the INTENT-2 and IMINE-2 competitions.

Key words: Subtopic Mining, Query Intent, Diversification, Word Embedding, Bipartite Graph
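The idea of a Jensen-Shannon divergence based similarity between two short texts can be illustrated with a minimal sketch. It is not the authors' exact function, which the abstract does not specify. It assumes each short text has already been expanded into a list of retrieved document strings (the retrieval step is not shown), uses whitespace tokenization, and defines similarity as one minus the base-2 Jensen-Shannon divergence; the names `term_distribution`, `js_divergence`, and `js_similarity` are hypothetical.

```python
import math
from collections import Counter


def term_distribution(documents):
    """Maximum-likelihood term distribution over a list of document strings."""
    # Assumption: lowercase whitespace tokenization stands in for real preprocessing.
    counts = Counter(tok for doc in documents for tok in doc.lower().split())
    total = sum(counts.values())
    return {term: c / total for term, c in counts.items()} if total else {}


def js_divergence(p, q):
    """Jensen-Shannon divergence with base-2 logs, so the value lies in [0, 1]."""
    vocab = set(p) | set(q)
    # Mixture distribution m = (p + q) / 2 over the joint vocabulary.
    m = {t: 0.5 * (p.get(t, 0.0) + q.get(t, 0.0)) for t in vocab}

    def kl(a):
        # KL(a || m); terms with a(t) == 0 contribute nothing to the sum.
        return sum(a[t] * math.log2(a[t] / m[t]) for t in a if a[t] > 0)

    return 0.5 * kl(p) + 0.5 * kl(q)


def js_similarity(docs_a, docs_b):
    """Assumed similarity: 1 - JSD of the two term distributions."""
    return 1.0 - js_divergence(term_distribution(docs_a), term_distribution(docs_b))
```

Under these assumptions, two texts whose retrieved documents share the same term distribution score 1, and two texts with no terms in common score 0. A production version would likely add stop-word removal and smoothing.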

Similar resources

Query Subtopic Mining Exploiting Word Embedding for Search Result Diversification

Understanding the users’ search intents through mining query subtopic is a challenging task and a prerequisite step for search diversification. This paper proposes mining query subtopic by exploiting the word embedding and short-text similarity measure. We extract candidate subtopic from multiple sources and introduce a new way of ranking based on a new novelty estimation that faithfully repres...

KDEIM at NTCIR-12 IMine-2 Search Intent Mining Task: Query Understanding through Diversified Ranking of Subtopics

In this paper, we describe our participation in the Query Understanding subtask of the NTCIR-12 IMINE Task. We propose a method that extracts subtopics by leveraging the query suggestions from search engines. The importance of the subtopics with the query is estimated by exploiting multiple query-dependent and query-independent features with supervised feature selection. To diversify the subtop...

Evaluating Ranking Diversity and Summarization in Microblogs using Hashtags

Diversification techniques for web search have recently been developed that assume that, for each query, there is a set of underlying aspects or subtopics that address specific user intents. These techniques attempt to balance the relevance of the retrieved documents with the coverage of the aspects. Evaluation of diversification techniques requires some way of defining a set of aspects for eac...

Diversifying Search Results with Popular Subtopics

This paper describes the method we use in the diversity task of web track in TREC 2009. The problem we aim to solve is the diversification of search results for ambiguous web queries. We present a model based on knowledge of the diversity of query subtopics to generate a diversified ranking for retrieved documents. We expand the original query into several related queries, assuming that query e...

Connected Component Based Word Spotting on Persian Handwritten image documents

Word spotting makes unindexed image documents searchable by locating a word or words in a document image, given a query word. This problem is challenging, mainly due to the large number of word classes with very small inter-class and substantial intra-class distances. In this paper, a segmentation-based word spotting method is presented for multi-writer Persian handwritten doc-...

Journal title:
  • IEICE Transactions

Volume 99-D  Issue 

Pages  -

Publication date 2016